ABSTRACT


This research memorandum presents methodology for
analysing failures of machines that are repeatedly turned on
and off. Because a machine can fail both when it is on and
off, different parametric models for failure are used for each
of these periods. An important issue addressed for such
machines is how the intermittent use itself affects failure.
Because the models can predict the chance of failure under
different usage patterns, less harmful usage patterns can be
recommended. As an example, the models are applied to a radar
system, and both the immediate and cumulative effects of
on-off cycling are demonstrated.

INTRODUCTION


Many  machines  are  used  intermittently,  that  is,  repeatedly  turned  on  and  off  during 
normal  operation.  Eventually,  the  machine  fails.  Following  failure,  the  machine  may 
be  repaired  or  replaced.  Failures  can  occur  either  when  a  machine  is  on  or  off — an  off 
period  might  be  followed  by  an  unsuccessful  attempt  to  turn  the  machine  on.  Examples 
include  light  bulbs,  automobiles,  and  electronic  components. 

Although some work has been done on modeling repairable machines [1] and [2],
relatively  little  work  has  been  directed  towards  modeling  intermittently  used  machines. 
There  are  indications,  however,  that  on-off  cycling  has  a  harmful  effect  on  a  system’s 
reliability.  As  demonstrated  in  [3],  the  failure  rate  increases  with  on-off  cycling  for 
certain intermittently used systems. Reference [2] suggests that on-off cycling may be
associated  more  strongly  with  failures  than  with  operating  time.  Modeling  failures  for 
such  machines  requires  some  modification  of  the  usual  methodology  under  which  the 
time  until  failure  is  regarded  as  a  continuous  random  variable  without  regard  to  the 
effect  of  on-off  cycling. 

In this analysis, the approach to modeling machine failure is to divide the
time-until-failure into on and off periods and specify different parametric failure models for
each period. This approach allows the risk of failure to vary over on and off periods.
Estimating the parameters of these models requires a continuous history of when the
machine is on, off, or broken. For on periods, a Weibull regression model on the time
since repair is assumed. The discovery of failure following an off period is modeled with
a logit regression. Covariates in both models are used to describe the operating history
of the machine. In this way, historical usage as well as other factors are allowed to affect
the chance of failure.

The  models  can  describe  both  the  immediate  and  cumulative  effect  on  reliability 
of  on-off  cycling.  Cumulative  on-off  cycling  can  be  included  as  a  covariate  in  both 
models.  The  immediate  effect  is  handled  differently  for  each.  For  the  logit  model,  the 
intercept parameter roughly corresponds to any factor that occurs with each off period,
for example, the actual shutting off and turning on. Each on period begins by switching
the machine on. The risk of failure during an on period is allowed to vary as a function of
the time since switching on, by incorporating a time-dependent covariate in the Weibull
model. The chosen function $\exp\{\beta_0 \ln(u)\}$, where u is the time since switched on, allows
risk to decrease, increase, or remain constant with u.

An  important  application  of  the  models  is  for  prediction.  Using  estimated  models, 
the  probability  of  failure  under  different  possible  usage  patterns  can  be  computed.  Given 
that  the  models  fit  reasonably  well,  these  probabilities  can  act  as  an  aid  in  choosing  how 
to  use  a  machine  under  a  certain  set  of  conditions,  with  an  eye  towards  improving 
reliability.

The next section describes the type of data necessary for estimating the Weibull and
logit  models.  The  two  models  are  developed  and  model  fit  is  discussed.  Subsequently, 
the  models  are  applied  to  a  radar  system,  and  both  the  immediate  and  cumulative 
effects  of  on-off  cycling  are  demonstrated.  The  probabilities  of  failure  under  different 
usage  patterns  are  computed  and  the  most  reliable  usage  pattern  determined.  Finally, 
generalizations  and  other  applications  of  the  methodology  are  discussed. 


MODEL 


Suppose a machine is repeatedly switched on and off, occasionally fails, and is
repaired after failure. Figure 1 provides two possible on-off patterns that end in failure.
Pattern A is for a machine that follows repair with a three-day on period and a one-day
off period. At this point, an unsuccessful attempt was made to turn on the machine.
Failure could have occurred anytime between days three and four. Pattern B is for a
machine that is switched on or off three times and fails while working five days following
repair. The history for an intermittently used repairable machine consists of numerous
on-off patterns like A and B. Often, only operating time is used to describe failures, and
the original patterns A and B are transformed into A' and B'. This precludes direct
examination of the effect of on-off cycling on failure rates. The approach in this paper
is to retain both the on and off periods and model A and B rather than A' and B'.

Clearly a machine can fail while on, but a machine can also fail while off.¹ As in
pattern A of figure 1, an off period will occasionally be followed by an unsuccessful
attempt to turn the machine on. Each off period involves shutting off, turning on, and
possible exposure to such factors as vibration or corrosion, which might cause failure. A
reasonable model should allow failure to occur while a machine is off as well as when it
is on.


The stress a machine receives while on is qualitatively different from the stress it
receives while off. This difference suggests that failures should be modeled separately for
each period. Apart from the different risk, the two periods also differ in the amount of
failure information they provide. If a machine fails while on, the precise time to failure
is known; if failure occurs while off, failure is only known to have occurred sometime
while the machine was off. In other words, for on periods, the time of failure should be
modeled with a continuous probability model. For off periods, the occurrence or not of
failure should be modeled by a discrete 0-1 probability.

The time until failure T, therefore, is modeled as a combination of discrete and
continuous components. Let $0 = t_1 < t_2 < \cdots$ denote the times at which a machine is
switched on or off. For simplicity, assume that only on periods immediately follow
repair. When j is odd, $t_j$ marks the start of an on period (or end of an off period).
When j is even, $t_j$ marks the start of an off period (or end of an on period). As shown in
[4], the survivor function for a combination of discrete and continuous components can
be written as


$$P(T > t) = \exp\left[-\int_0^t \lambda(s)\,ds\right] \prod_{j:\,t_{2j+1} \le t} \{1 - p(j)\} , \qquad (1)$$

where

$t_{2j+1}$ = the end of the jth off period

$\lambda(t)$ = the hazard function while the machine is on
($0 < t < t_2$, or $t_{2j-1} < t < t_{2j}$ for $j > 1$),
0 while the machine is off

p(j) = probability the jth off period ends in failure.

¹ The "off" period is used in a generic sense and includes portions of the shutting off or turning on
sequence when the system is partially energized.

FIG. 1: TWO ON-OFF PATTERNS ENDING IN FAILURE

The overall model is specified by two separate functions. During on periods, a
continuous hazard function or instantaneous rate of failure $\lambda(t)$ is used. During off
periods, a discrete hazard or conditional probability of failure p(j) is specified. Note
that for notational convenience, off failures are ascribed to the end of an off period at
$t_{2j+1}$. In actuality, there is no way of knowing when during the off period failure occurred.
Also for notational convenience, the fact that for the first on period the hazard is defined
at $t_1 = 0$ will be suppressed.
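As a minimal sketch of how the continuous and discrete components combine, the following Python function evaluates a survivor function of this mixed form: the exponential of minus the hazard integrated over on periods, multiplied by one survival factor for each completed off period. The function name `survivor`, the midpoint-rule integration, and the constant hazard in the usage line are illustrative assumptions, not part of the original analysis.

```python
import math

def survivor(t, switch_times, hazard, p_off, steps=2000):
    """P(T > t) for alternating on/off periods that begin with an on period.

    switch_times: [t1, t2, t3, ...] with t1 = 0; entries at odd positions
    (t1, t3, ...) start on periods, entries at even positions start off periods.
    hazard(s): hazard rate at time s while the machine is on.
    p_off(j): probability that the j-th off period (j = 1, 2, ...) ends in failure.
    """
    cum_hazard = 0.0
    survive_off = 1.0
    for k in range(0, len(switch_times), 2):      # k indexes on-period starts
        start = switch_times[k]
        if start >= t:
            break
        end = switch_times[k + 1] if k + 1 < len(switch_times) else t
        upper = min(end, t)
        width = (upper - start) / steps
        # Midpoint rule over the part of this on period that lies before t.
        cum_hazard += sum(hazard(start + (i + 0.5) * width) for i in range(steps)) * width
        # The off period that follows ends at switch_times[k + 2].
        if k + 2 < len(switch_times) and switch_times[k + 2] <= t:
            survive_off *= 1.0 - p_off(k // 2 + 1)
    return math.exp(-cum_hazard) * survive_off

# Hypothetical usage: on for days 0-3, off for days 3-4, on again from day 4.
print(survivor(5.0, [0.0, 3.0, 4.0, 6.0], lambda s: 0.1, lambda j: 0.05))
```

Because the hazard is zero while the machine is off, only on periods contribute to the integral; an off period contributes only through its factor 1 - p(j), and only once it has ended.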

A popular choice for a continuous hazard function, which incorporates covariates, is
given by the proportional hazards model [4]. Here, the hazard function is written as a
product of an exponential function of covariates times a baseline hazard function. The
covariates act multiplicatively on the baseline hazard function $\lambda_0(t)$:

$$\lambda\{t \mid z(t)\} = \lambda_0(t) \exp\{z(t)'\beta\} , \qquad (2)$$

where

t = a time index

$\lambda_0(t)$ = the baseline hazard rate

z(t) = a column vector of possibly time-varying covariates

$\beta$ = a column vector of covariate parameters.

As suggested in [1], the time index might be age of the machine or, for repairable
machines, time since the last repair or operating time since last repair. For simplicity, in
the remainder of the paper t will be time since last repair. The covariate vector z(t) can
be used to describe the usage history or other aspects of the machine. While $\lambda_0(t)$ might
be left unspecified, in this paper a parametric (Weibull) baseline hazard will be used.
An  important  application  of  the  model  is  for  prediction,  and  a  well-chosen  parametric 
hazard  is  somewhat  easier  to  work  with  than  a  nonparametric  hazard. 

Although historical and cumulative usage may influence the chance of failure, it is
also possible that switching a machine on has an "immediate" effect on the failure rate.
The chance of failure may be high when the machine has just been switched on and then
taper off, or follow an opposite pattern. This suggests multiplying the baseline hazard
by a function of the time since on (u), which can be either increasing or decreasing.²
One such function is $\exp\{\beta_0 \ln(u)\}$. With this function of u and the Weibull baseline
hazard, equation 2 can be rewritten for the jth on period as


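$$\lambda\{t \mid z(t)\} = \alpha_1 t^{\alpha_1 - 1} \exp\{\beta_0 \ln(t - t_{2j-1}) + z_*(t)'\beta_*\} \qquad (3)$$

$$\lambda\{t \mid z(t)\} = \alpha_1 t^{\alpha_1 - 1} u^{\alpha_2 - 1} \exp\{z_*(t)'\beta_*\} , \qquad (4)$$

where

$\beta' = [\beta_0, \beta_*']$

$\beta_0 = \alpha_2 - 1$

$z(t)' = [\ln(u), z_*(t)']$

$u = t - t_{2j-1}$, for $t_{2j-1} \le t < t_{2j}$.

Recall that $t_{2j-1}$ marks the start of the jth on period.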

When $\alpha_1$ is unity, the model reduces to a Weibull with time index u, and the start
of each on period resets the time index to 0. When $\alpha_2$ is unity, a Weibull with time
index t results, and, in this case, there is no harmful transient effect following turning
the machine on. If both $\alpha_1$ and $\alpha_2$ differ from unity, equation 4 is, in effect, the product of two
Weibull models with different time scales. Alternatively, equation 4 can be viewed as a
simple Weibull on either u or t with a time-dependent covariate, respectively, $\ln(t) = \ln(u + t_{2j-1})$
or $\ln(u) = \ln(t - t_{2j-1})$. Reference [4], pages 123-124, provides an example
of a similar model where a Weibull shape parameter $\alpha$ has a "regression" interpretation,
that is, as a "slope" parameter in a proportional hazards model.
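The behavior of the two shape parameters can be checked numerically. The sketch below assumes the hazard has the form alpha1 * t^(alpha1 - 1) * u^(alpha2 - 1) * exp(linear predictor), with t the time since repair and u the time since switching on; the function name `two_weibull_hazard` and the leading constant alpha1 are assumptions made for illustration.

```python
import math

def two_weibull_hazard(t, t_on, alpha1, alpha2, linear_predictor=0.0):
    """Hazard at time-since-repair t for an on period that started at t_on.

    alpha1 acts on time since repair (t); alpha2 acts on time since on (u).
    linear_predictor stands for the covariate term z'beta (0 if no covariates).
    """
    u = t - t_on                      # time since the machine was switched on
    if t <= 0 or u <= 0:
        raise ValueError("hazard is evaluated only for t > 0 and u > 0")
    return alpha1 * t ** (alpha1 - 1) * u ** (alpha2 - 1) * math.exp(linear_predictor)

# Half a day into the first on period versus half a day into a second on
# period that starts at day 1, for three hypothetical parameter settings.
for a1, a2 in [(0.8, 1.0), (1.0, 0.8), (0.8, 0.8)]:
    first = two_weibull_hazard(0.5, 0.0, a1, a2)
    second = two_weibull_hazard(1.5, 1.0, a1, a2)
    print(a1, a2, first, second)
```

With alpha1 = 1 the two values coincide, because switching on resets the only active time scale; with alpha2 = 1 the hazard depends on time since repair alone.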

Figure 2 displays the hazard rate under three special cases of this "two-Weibull"
model when $z_*(t) = 0$. In figure 2, the machine is assumed to be repaired, turned on for
almost one day, briefly switched off, and then turned on again. Both models with $\alpha_2 = .8$
display an increase in risk following switching on at the start of the second day. This
corresponds to a harmful transient effect following switching on. For the Weibull with
time-since-repair as time index ($\alpha_1 = .8$, $\alpha_2 = 1$), there is a smoothly decreasing hazard,
except for the brief off period at the end of day one. For the Weibull with time-since-on
as time index ($\alpha_1 = 1$, $\alpha_2 = .8$), the time index is set back to zero at the start of the
second day, and the hazard for the second day is identical to that of the first. When
both $\alpha_1$ and $\alpha_2$ differ from unity, there is a decreasing hazard following both switchings
on; however, the decrease is less sharp following the second switching on.

² Reference [5] makes a similar assumption about the transient effect of a discrete event on the hazard
rate in its example 8.y. Immediately following a heart transplant, the hazard is multiplied by a
decreasing function of time.

Equation 4 is a model for failures while the machine is turned on. As noted before,
however, failures occasionally are discovered at the end of an off period. If a failure
immediately follows an off period, the exact time of failure is unknown. The failure is
known only to have occurred sometime during the off period. In other words, there is
a 0-1 indicator of whether or not failure occurs associated with each off period. A logit
regression model will be used to describe the probability of failure $p(x_j)$ (with a slight
abuse of notation) for the jth off period:

$$\ln\left[\frac{p(x_j)}{1 - p(x_j)}\right] = x_j'\theta , \qquad (5)$$

where

$\theta$ is a column vector of parameters

$x_j$ is a column vector of explanatory variables that describe the jth off period

$p(x_j)$ is the probability that the jth off period ends in failure.

For both models, the covariate vectors might include fixed attributes of the machine
(e.g., manufacturer) or descriptions of the on-off history of the machine (e.g., daily rate
of on-off cycling). For $x_j$, the length of the off period might be included to see if longer
off periods are more likely to result in failure. For example, in pattern B of figure 1, one
element of $x_j$ would be 2.
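For illustration, the logit relation can be inverted to give the failure probability of an off period directly from its covariates. In the sketch below the function name `off_failure_probability` and the coefficient values are hypothetical; the covariate vector holds an intercept term followed by the length of the off period in days.

```python
import math

def off_failure_probability(x, theta):
    """Invert the logit: p = 1 / (1 + exp(-x'theta))."""
    eta = sum(xi * ti for xi, ti in zip(x, theta))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept and effect of off-period length (days).
theta = [-4.0, 0.07]
p_two_day = off_failure_probability([1.0, 2.0], theta)   # a two-day off period
p_one_day = off_failure_probability([1.0, 1.0], theta)   # a one-day off period
print(p_one_day, p_two_day)
```

A positive coefficient on the length of the off period makes longer off periods more likely to end in failure, which is the effect the covariate is meant to detect.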

With equations 4 and 5, the likelihood for any on-off pattern can be specified.
Referring again to the notional histories of figure 1, let $x^A$ be the logit covariate vector
associated with the off period of pattern A, and $z^A(t)$ be the covariate history for the on
period, with a similar definition for pattern B. The likelihood corresponding to the data
of figure 1 is

$$\exp\left[-\int_0^3 \lambda\{t \mid z^A(t)\}\,dt\right] p(x^A) \times \exp\left[-\int_0^1 \lambda\{t \mid z^B(t)\}\,dt\right] \{1 - p(x^B)\} \exp\left[-\int_3^5 \lambda\{t \mid z^B(t)\}\,dt\right] \lambda\{5 \mid z^B(5)\} .$$
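The likelihood of the two notional histories can be evaluated numerically. The sketch below assumes pattern A is on for days 0 to 3 and then off for one day, and pattern B is on for days 0 to 1, off for days 1 to 3, and on for days 3 to 5 with failure at day 5 (timings inferred from the text and the two-day off period). It also assumes a two-Weibull hazard of the form alpha1 * t^(alpha1 - 1) * u^(alpha2 - 1) * exp(eta) with a constant linear predictor `eta`; all function names are illustrative.

```python
import math

def hazard(t, t_on, alpha1, alpha2, eta):
    """Two-Weibull hazard at time-since-repair t for an on period begun at t_on."""
    return alpha1 * t ** (alpha1 - 1) * (t - t_on) ** (alpha2 - 1) * math.exp(eta)

def cumulative_hazard(t_on, length, alpha1, alpha2, eta, steps=4000):
    """Integrate the hazard over one on period of the given length.

    The substitution u = s**4 removes the integrable singularity at u = 0,
    so a plain midpoint rule is adequate.
    """
    ds = length ** 0.25 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        u = s ** 4
        total += alpha1 * (t_on + u) ** (alpha1 - 1) * u ** (alpha2 - 1) * 4 * s ** 3 * ds
    return total * math.exp(eta)

def loglik_patterns(alpha1, alpha2, eta, p_a, p_b):
    """Log-likelihood of patterns A and B; p_a, p_b are off-period failure probabilities."""
    # Pattern A: survives the on period, then the off period ends in failure.
    ll_a = -cumulative_hazard(0.0, 3.0, alpha1, alpha2, eta) + math.log(p_a)
    # Pattern B: survives on, survives off, then fails while on at day 5.
    ll_b = (-cumulative_hazard(0.0, 1.0, alpha1, alpha2, eta)
            + math.log(1.0 - p_b)
            - cumulative_hazard(3.0, 2.0, alpha1, alpha2, eta)
            + math.log(hazard(5.0, 3.0, alpha1, alpha2, eta)))
    return ll_a + ll_b

print(loglik_patterns(0.9, 0.9, -2.0, 0.03, 0.04))   # hypothetical parameter values
```

The hazard terms and the off-period terms enter as separate sums, so each set of parameters can be varied without affecting the other part of the log-likelihood.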


In general, suppose there are n failures and that the on-off pattern for the ith failure
has $s_i$ off periods and $r_i$ on periods ($s_i = r_i$ or $r_i - 1$). Let $0 = t_{i,1} < t_{i,2} < \cdots < t_{i,m_i}$,
where $m_i = r_i + s_i + 1$, denote the start of each on or off period since the last repair, for
the ith failure. Following the convention that only on periods immediately follow repair,
the likelihood can be written as follows:

$$\prod_{i=1}^{n} \left[ \prod_{j=1}^{s_i} p(x_{i,j})^{Y_{i,j}} \{1 - p(x_{i,j})\}^{1 - Y_{i,j}} \right] \left[ \prod_{j=1}^{r_i} \exp\left\{-\int_{I_{i,j}} \lambda\{t \mid z_i(t)\}\,dt\right\} \lambda\{t_{i,2j} \mid z_i(t_{i,2j})\}^{d_{i,j}} \right] , \qquad (6)$$

where

$Y_{i,j}$ = 1 if the i,jth off period ends in failure, 0 otherwise

$d_{i,j}$ = 1 if the i,jth on period ends in failure, 0 otherwise

$I_{i,j} = (t_{i,2j-1}, t_{i,2j})$, the time interval for the i,jth on period.

Note that for any i, only $Y_{i,s_i}$ or $d_{i,r_i}$ (but not both) can be 1.

The likelihood can be factored into the product of a term involving the "two-Weibull" parameters and
a term involving the logit parameters. Maximization of the likelihood can be accomplished separately for
each set of parameters. The logit likelihood is straightforward to maximize. The
two-Weibull likelihood is a bit harder since it involves at least one time-dependent covariate.
A brief discussion of likelihood maximization with time-dependent covariates is given in
[5].


As noted before, if $\alpha_1$ is unity, equation 4 reduces to a simple Weibull regression
with time-since-on as the time index. With this simplification, the likelihood can be
maximized using readily available software. If some covariate selection is required, it
may be advantageous to use this simpler model for exploratory work. In addition,
estimates based on the simpler model can be used as an initial guess for the two-Weibull
maximization. For each model, the matrix of Fisher information can be inverted and the
diagonal elements used to determine the significance of the individual parameters in the
usual  fashion.  Additionally,  score  or  likelihood  ratio  tests  can  be  used  to  test  hypotheses 
about  the  parameters. 

MODEL VALIDATION


The  goodness  of  fit  of  the  model  can  be  assessed  by  separately  examining  the  fit 
of  the  logit  and  Weibull  models  using  generalized  residuals.  Generalized  residuals  are 
functions  of  the  data  that,  if  the  model  fits,  roughly  follow  a  known  distribution. 

Reference [6] discusses generalized residuals for logit models. When the $Y_{i,j}$s are
generally zero, it suggests grouping the observations into "cells" based on similar $x_{i,j}$
values and standardizing the number of failures in each cell. One such residual is based
on the components of the Pearson chi-square goodness-of-fit statistic:

$$r_k = \frac{Y_k - E[Y_k]}{\sqrt{V[Y_k]}} ,$$

where

$$Y_k = \sum Y_{i,j} , \qquad E[Y_k] = \sum p(x_{i,j}) , \qquad V[Y_k] = \sum p(x_{i,j})\{1 - p(x_{i,j})\} .$$

The sums are taken over the kth grouping of the $x_{i,j}$s. If the model fits, the $r_k$ should
roughly follow a standard normal distribution.
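The grouped residual is simple to compute once fitted probabilities are available. A minimal sketch follows, assuming the residual for a cell is the observed failure count minus its expectation, divided by the square root of the binomial-type variance; the function name `pearson_residual` and the example data are hypothetical.

```python
import math

def pearson_residual(outcomes, probabilities):
    """Standardized failure count for one cell of off periods.

    outcomes: 0/1 failure indicators for the off periods in the cell.
    probabilities: fitted failure probabilities for the same off periods.
    """
    observed = sum(outcomes)
    expected = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    return (observed - expected) / math.sqrt(variance)

# Hypothetical cell: 50 off periods with fitted probability .03 each, 2 failures.
r = pearson_residual([1, 1] + [0] * 48, [0.03] * 50)
print(r)
```

Residuals well beyond two in absolute value for several cells would point to covariate regions where the logit model fits poorly.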

If the Weibull model is viewed as a Weibull with time index u (time since turned on)
and time-dependent covariate $\ln(u + t_{i,2j-1})$, the i,jth generalized residual can be defined
as

$$F(u_{i,j}) = 1 - \exp\left[-\int_0^{u_{i,j}} \lambda\{w \mid z(w + t_{i,2j-1})\}\,dw\right] ,$$

where $u_{i,j} = t_{i,2j} - t_{i,2j-1}$ is the length of the on period, and $t_{i,2j-1}$ is the time-since-failure
at the start of the on period. According to the probability integral transformation,
the $F(u_{i,j})$s will, if the model fits, behave like a random sample from a uniform(0,1)
distribution (with some right-censored observations). A Kaplan-Meier survival-curve
estimate based on the $F(u_{i,j})$s should look like a uniform(0,1) survival curve if the
model fits [4] or [5].
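The check described above can be sketched in two steps: transform each on period's cumulative hazard into a residual, then compute a Kaplan-Meier estimate that treats on periods not ending in failure as right-censored. The function names and the small data set below are hypothetical; a uniform(0,1) survival curve takes the value 1 - v at v, which gives the comparison line.

```python
import math

def generalized_residual(cum_hazard):
    """Probability integral transform of an on period's cumulative hazard."""
    return 1.0 - math.exp(-cum_hazard)

def kaplan_meier(values, events):
    """Return [(value, survival estimate)] at each distinct failure value.

    events[i] is 1 if the on period ended in failure, 0 if it is censored
    (the machine was switched off while still working).
    """
    data = sorted(zip(values, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        v = data[i][0]
        failures = removed = 0
        while i < len(data) and data[i][0] == v:   # handle ties at the same value
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((v, surv))
        at_risk -= removed
    return curve

# Hypothetical cumulative hazards and failure indicators for five on periods.
residuals = [generalized_residual(h) for h in (0.1, 0.4, 0.7, 1.2, 2.0)]
for v, s in kaplan_meier(residuals, [0, 1, 0, 1, 1]):
    print(v, s, 1.0 - v)        # estimate next to the uniform(0,1) reference
```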

EXAMPLE


One example of an intermittently used machine is a U.S. Naval radar. While at sea,
the radar is generally on. While in port, the radar is generally off. Other factors that
determine  usage  are  preventive  maintenance,  assurance  that  the  radar  works,  and  ship 
overhaul.  Although  operational  considerations  sometimes  require  the  radar  to  be  on  or 
off,  at  other  times  there  may  be  no  strong  requirement  for  a  radar  to  be  either  on  or  off. 
For  example,  preventive  maintenance  schedules  might  be  changed  or  assurance  checks 
might  be  done  less  frequently. 

The  U.S.  Navy  keeps  a  continuous  log  of  when  radars  are  on,  off,  or  broken.  Figure  3 
shows  an  actual  usage  history,  over  a  two-month  period,  for  a  radar  on  one  of  the  Navy’s 
ships.  Notice  that  at  some  points,  on-off  cycling  is  frequent.  Also,  some  spells  of  data  are 
missing.  Data  were  collected  on  45  radars  of  a  particular  model  for  a  two-year  period, 
yielding a total of 64 unit-years of complete data.

Although  several  features  of  the  usage  pattern  may  influence  failures,  this  section 
focuses on a few simple covariates for exposition. For the Weibull model, two time-fixed
covariates were defined (they remain constant over each on period): percent time on
and on-off cycles per day. Both were measured from the last repair until the start of
the on period. For periods immediately following repair, both were set to zero. The log
of the time-since-on was used as a time-dependent covariate, and time-since-repair was
used as a time index. For the logit model, two additional covariates were used: length
of the off period and time between the last repair and the start of the off period.

Table  1  presents  some  summary  statistics  of  the  data.  Note  that  failures  are  much 
more  likely  while  a  machine  is  on  than  off.  Also,  on  periods  tend  to  be  over  three  times 
as  long  as  off  periods.  For  a  handful  of  observations,  the  on-off  cycling  rate  covariates 
were  extremely  large  (time-since-repair  was  small  for  these)  and  had  a  marked  influence 
on  the  parameter  estimates.  These  observations  were  deleted  and  are  not  included  in 
table  1. 
FIG. 3: USAGE HISTORY FOR ONE RADAR


TABLE 1

MEANS, STANDARD DEVIATIONS, AND COUNTS
OF SOME VARIABLES BY PERIOD

                                                  Period
Variable                                 On                 Off

Length of period in days                 3.51 (5.50)a       1.00 (2.37)
Days since repair                        13.80 (20.17)      18.24 (21.43)
On-off cycles per day since repair       .72 (1.14)         .79 (.82)
Percent time on since repair             .51 (.39)          .78 (.27)
Number of periods                        2204               1695
Number of periods ending in failure      509                49

a. Numbers in parentheses are standard deviations.


Table 2 presents the maximum likelihood estimates and standard errors based on
the two models. For the logit model, only the length of the off period has a statistically
significant effect: the longer a machine is shut off, the less likely it is to work when an
attempt is made to turn it on. For the Weibull model, all of the estimates are statistically
significant. The signs of the regression estimates indicate that (1) machines that are used
most of the time (since last repair) are less likely to fail, and (2) historical on-off cycling
increases the chance of failure. The Weibull shape parameters indicate that risk decreases
following repair and also following switching on. Thus, while a machine is turned on,
on-off cycling has two harmful effects: an immediate higher risk as described by $\alpha_2$ and
a cumulative higher risk. Although when a machine is turned off there is no estimated
cumulative effect of on-off cycling, each off period carries its own cost. Every time the
machine is shut off, there is a small probability of failure.
TABLE 2

PARAMETER ESTIMATES

                                                   Model
Effect                                   Two-Weibull        Logit

$\alpha_1$ (time since repair)                .878 (.04)a        -
$\alpha_2$ (time since on)                    .871 (.04)         -
Intercept                                -2.259 (.07)       -3.883 (.58)
Time since repair                        -                  -.012 (.01)
Length of off period                     -                  .067 (.03)
On-off cycles per day since repair       .114 (.04)         -.070 (illegible)
Percent time on since repair             -.400 (.15)        .665 (.60)

a. Numbers in parentheses are the standard errors.
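The significance statements can be checked against the tabulated values with a simple ratio of estimate to standard error. The sketch below assumes the decimal points lost in extraction are restored as shown, uses the conventional 1.96 cutoff, and compares the shape parameter with 1 rather than 0; these choices and the function name `wald_z` are assumptions, and the logit coefficient whose standard error is not legible is left out.

```python
def wald_z(estimate, std_error, null_value=0.0):
    """Ratio of (estimate - null value) to its standard error."""
    return (estimate - null_value) / std_error

# Logit rows of table 2 with decimal points restored (an assumption).
logit_rows = {
    "Time since repair": (-0.012, 0.01),
    "Length of off period": (0.067, 0.03),
    "Percent time on since repair": (0.665, 0.60),
}
significant = {name: abs(wald_z(est, se)) > 1.96
               for name, (est, se) in logit_rows.items()}

# A shape parameter of 1 means no dependence on that time scale,
# so the natural null value for alpha_1 is 1 rather than 0.
z_alpha1 = wald_z(0.878, 0.04, null_value=1.0)
print(significant, z_alpha1)
```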


As  an  aid  to  interpreting  the  parameter  estimates,  figure  4  combines  the  continuous 
hazard  (for  on  periods)  and  the  probability  of  failure  or  discrete  hazard  (for  off  periods) 
for a short usage pattern. In figure 4, the radar is turned on and then shut off on
alternate  days  (following  repair)  over  a  four-day  period.  The  continuous  hazard  rate 
decreases  sharply  for  the  first  day  and  decreases  less  steeply  for  the  third  day.  The 
probability  of  failure  is  quite  small  for  both  off  periods. 

Model fit is assessed using the techniques discussed in the previous section. Figure 5
displays a plot of the deviance residuals against grouped cell number. Quintiles for
the first two covariates (secure period and time since failure) were computed, and each
observation was grouped into one of the cells defined by the 25 pairs of quintiles. Almost
all of the $r_k$s are less than two³ in absolute value, suggesting the logit cannot be dismissed
as inappropriate. Figure 6 displays a smoothed plot of the Kaplan-Meier estimate of
the $F(u_{i,j})$s. The curve roughly follows that of a uniform(0,1) survival function, which
indicates the two-Weibull model fits reasonably well.

³ Recall that the $r_k$s have roughly a standard normal distribution if the model fits.

FIG. 4: ESTIMATED HAZARD FUNCTION FOR A DAILY-ON/DAILY-OFF PATTERN

Having  assessed  model  fit,  a  measure  of  confidence  concerning  predictions  from  the 
model  is  warranted.  Although  caution  against  extrapolation  to  unusual  usage  patterns 
must  be  exercised,  prediction  can  help  decide  on  usage.  If  there  are  several  possible 
usage  patterns,  the  probability  of  failure  under  each  pattern  can  be  computed  and  the 
one  with  the  lowest  probability  of  failure  chosen.  As  an  illustration,  suppose  preventive 
maintenance  (PM)  over  a  two-week  period  is  planned  for  the  radar  and  that  two  programs 
are  possible:  (1)  the  radar  is  shut  off  for  one  hour  each  day  for  PM,  or  (2)  the  radar  is 
shut  off  for  two  hours  every  other  day  for  PM.  For  simplicity,  assume  that  at  all  other 
times  the  radar  is  working  and  that  the  radar  has  just  been  repaired.  In  both  cases, 
the  radar  is  on  for  all  but  14  hours;  however,  the  on-off  cycling  rate  for  (1)  is  twice 
that  of  (2).  With  bidaily  PM,  the  probability  of  no  failure  is  .29;  with  daily  PM,  the 
probability  of  no  failure  is  .19.  The  difference  in  probabilities  suggests  that  bidaily  PM 
would produce fewer failures, as long as the beneficial effect of daily versus bidaily PM
was slight.
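The comparison of the two PM programs can be sketched by accumulating the on-period hazard and the off-period failure probabilities over the two weeks. The code below assumes the table 2 estimates with decimal points restored, a hazard of the form alpha1 * t^(alpha1 - 1) * u^(alpha2 - 1) * exp(eta), time measured in days, and covariates recomputed at the start of each period with "cycles" taken as completed off periods per day. These definitions are assumptions, so the output should not be expected to reproduce the reported .29 and .19 exactly; only the direction of the comparison is of interest.

```python
import math

# Table 2 estimates with decimal points restored (an assumption about the source).
WEIBULL = {"alpha1": 0.878, "alpha2": 0.871, "intercept": -2.259,
           "cycles": 0.114, "pct_on": -0.400}
LOGIT = {"intercept": -3.883, "since_repair": -0.012, "off_length": 0.067,
         "cycles": -0.070, "pct_on": 0.665}

def cumulative_hazard(t_on, length, eta, steps=2000):
    """Integrate the two-Weibull hazard over one on period (times in days).

    The substitution u = s**4 removes the integrable singularity at u = 0.
    """
    a1, a2 = WEIBULL["alpha1"], WEIBULL["alpha2"]
    ds = length ** 0.25 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        u = s ** 4
        total += a1 * (t_on + u) ** (a1 - 1) * u ** (a2 - 1) * 4 * s ** 3 * ds
    return total * math.exp(eta)

def prob_no_failure(on_len, off_len, n_cycles):
    """Probability of surviving n_cycles of (on, off), starting just after repair."""
    t = on_time = 0.0
    log_surv = 0.0
    for k in range(n_cycles):
        cycles = k / t if t > 0 else 0.0          # completed off periods per day
        pct_on = on_time / t if t > 0 else 0.0    # fraction of time on since repair
        eta = (WEIBULL["intercept"] + WEIBULL["cycles"] * cycles
               + WEIBULL["pct_on"] * pct_on)
        log_surv -= cumulative_hazard(t, on_len, eta)
        t += on_len
        on_time += on_len
        eta_off = (LOGIT["intercept"] + LOGIT["since_repair"] * t
                   + LOGIT["off_length"] * off_len
                   + LOGIT["cycles"] * (k / t) + LOGIT["pct_on"] * (on_time / t))
        log_surv += math.log(1.0 - 1.0 / (1.0 + math.exp(-eta_off)))
        t += off_len
    return math.exp(log_surv)

daily = prob_no_failure(23 / 24, 1 / 24, 14)     # one hour off each day
bidaily = prob_no_failure(46 / 24, 2 / 24, 7)    # two hours off every other day
print(daily, bidaily)
```

Under these assumptions the daily program passes through twice as many switch-on transients and off periods as the bidaily program, which is what drives its lower probability of no failure.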


DISCUSSION 


This  paper  provides  a  parametric  methodology  for  modeling  failures  of  machines 
that  are  used  intermittently.  The  hazard  function  of  the  time  until  failure  is  broken  up 
into  continuous  and  discrete  components.  These  correspond,  respectively,  to  the  hazard 
during  on  and  off  periods.  Other  models  for  failures  during  these  periods  could  be  used. 
For  example,  a  probit  could  be  used  during  off  periods  or  the  baseline  hazard  could 
be left unspecified during on periods, yielding the usual Cox regression model [7]. In
addition,  a  time  index  other  than  time  since  repair  might  have  been  chosen  or  a  function 
other  than  log  for  the  time-dependent  covariate  might  be  used.  The  point  of  this  paper 
is  to  suggest  a  type  of  methodology  for  intermittently  used  machines  where  separate 
models  are  used  for  on  and  off  periods. 

The  arguments  leading  to  the  two-Weibull  model  were  to  treat  time-since-repair  (t) 
as  a  time  index  and  use  a  function  of  the  time-since-on  (u)  as  a  time-dependent  covariate. 
The model was shown to be equivalent to a Weibull with u as time index and a function
of t as a time-dependent covariate. Taking this equivalence one step further, perhaps it is
not unreasonable to use t as a time-fixed covariate. This results in a Weibull regression
on u with no time-dependent covariates and a much simpler maximization problem. Such
an approach, of course, needs to be empirically justified.

Although  the  focus  of  this  paper  has  been  on  intermittently  used  machines,  the 
methodology could be applied to other situations where the time-until-failure is a
combination of discrete and continuous components. Generalization to more than two states
is straightforward: separate failure models could be specified for each state. Depending
on how similar the different states are, some of the parameters for the different state
models might be assumed to be equal.